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It is tacitly accepted that, for practical basis sets consisting of N functions, solu¬ 
tion of the two-electron Coulomb problem in quantum mechanics requires storage of 
O(N^) integrals in the small N limit. For localized functions, in the large N limit, 
or for planewaves, due to closure, the storage can be reduced to O(N^) integrals. 
Here, it is shown that the storage can be further reduced to for separable 

basis functions. A practical algorithm, that uses standard one-dimensional Gaussian- 
quadrature sums, is demonstrated. The resulting algorithm allows for the simulta¬ 
neous storage, or fast reconstruction, of any two-electron coulomb integral required 
for a many-electron calculation, on each and every processor of massively parallel 
computers even if such processors have very limited memory and disk space. For 
example, for calculations involving a basis of 9171 planewaves, the memory required 
to effectively store all coulomb integrals decreases from 2.8Gbytes to less than 2.4 
Mbytes. 
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I. INTRODUCTION 


In this communication a workable algorithm is derived and presented that allows each 
processor to store all information required to quickly look up any two-electron integral, 
involving four basis functions, needed for either density-functional or multiconfigurational 
wavefunction methods. The method is demonstrated by applications of a uniform electron 
gas, confined to a cubic box, for electrons with wavevectors that are enclosed in a Fermi 
sphere. 

Strategies for rapid calculation or efficient storage of two-electron integrals, for density- 
functionali'^ calculations, or multiconhgurational active space methods^*^ continue to evolve 
as different mathematical techniques and different types of computing platforms arise and 
as different types of basis functions are implemented for use in electronic structure calcu¬ 
lations. A recent comprehensive review of these efforts by Reine et aP- includes discussions 
of least-square variational fitting methods^i^ and Rys polynomials.- Other methods such as 
direct methods,- analytic algebraic decompositions,— tensor hypercontraction^ and multi¬ 
pole methodsi^ are also widely used. Many of these methods support the hypothesis that 
the space of two-electron integrals is smaller than naively expected. 

This paper seeks to formally prove, for separable functions used in electronic structure 
calculations, that the set of information on which the Coulomb integrals truly depends 
is much smaller than expected from a permutational analysis. Further a practical approach 
is developed and applied to the uniform electron gas. The algorithm is based upon a 
three-dimensional Fourier transform, a one-dimensional Laplace transform, an additional 
one-dimensional integral transform, and the use of Gaussian quadrature. The storage re¬ 
quirements needed to calculate matrix elements associated with the coulomb operator is 
reduced to for either planewaves or Gaussians. 

Another motivation for this work is that the development of massively parallel methods 
requires one to break a problem up into many independent subtasks that can then be per¬ 
formed simultaneously by a large number of computer processors.— To achieve high efficiency 
on massively parallel architectures, it is necessary to ensure that the amount of information 
exchanged between processors is small and that the rate of information exchange is intrinsi¬ 
cally faster than the computing time used by any processor. For future low-power computing 
platforms it is desireable, if not expected, for each processor to have a very limited amount 
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of computer memory. Thus, in reference to many-electron quantum mechanics or density 
functional theory,-!!^”— it is appropriate to reconsider whether there are other means for 
reconstructing matrix elements that might be more efficient on modern massively parallel 
architectures. For such systems it would be ideal to allow each processor to quickly recon¬ 
struct any possible coulomb integral needed for a quantum-mechanical simulation without 
information transfer to or from other processors. 

II. DERIVATION 

There is one important aspect of this derivation that appears to be universally correct 
for many, possibly all, choices of separable basis functions and that is dehnitely correct 
for planewave and Gaussian basis functions. Therefore some general considerations are 
discussed before moving the focus of this paper to applications within planewave basis sets. 
Given a set of inhnitely differentiable and continuous one-dimensional functions, labeled as 
f/(a;), it is possible to create three-dimensional basis functions gi{r) according to: 

9i{r) = fi{x)fm{y)fn{z) =YlfiAx), ( 1 ) 

X 

with I = {l,m,n). Gommon examples of such basis functions include planewaves inside a 
box or unit cell or products of one-dimension Gaussian functions which generally also have 
separable polynomial prefactors. In the former case one generally uses all possible products 
subject to the constraint that ^|I| < kc and then seeks convergence by performing the 
calculation as a function of the cutoff wavenumber {kc)- Assuming one chooses a total of 
N three-dimensional basis functions, it is then clear that there are approximately one¬ 
dimensional basis functions for each cartesian coordinate. For simplicity, but not actually 
required for this observation, the assumption is that the same one-dimensional basis sets 
are used for each cartesian component. So, even though there are pairs of three dimen¬ 
sional basis functions, there are only one dimensional products of basis functions. For 
planewaves, the complexity is further reduced to since the product of a planewave is 

a plane wave. For Gaussians this number becomes with g a characteristic number of 

neighbors, since the product of two well separated Gaussians is identically zero. 

The matrix elements that are needed to solve the Goulomb problem in density func¬ 
tional theory or to determine matrix elements required for either Hartree-Fock or Multi- 


3 



Configurational calculations are given by 


Cijkl =< 5'i5'j|i—>= / [ d^rd^r'- — ^-—gi{r)gj{r)gj^{r)gi^{r). (2) 

|r-r'| J J |r-r'| 

However, by using a continuous Fourier transform of j^:zy 7 |, followed by a Laplace transform 
of the above equation can be written in quasi-separable form according to: 


CijkL — < 5'l5'j|l-Tllfi'KS'L > 

r — r' 


= 4:71 d^p / dV / d^f 


^g*p(r-r') 


pz 


9i{r)gj{r)gK{r')gL{r'). 


( 3 ) 

( 4 ) 


= 47ry da j d^p j d^r j gi{r)gj{r)gK{r')gi^{r') (5) 

POO 

47r / d(y I I ^x-) Jxi ^^x-) Lx) (6) 

rote roo 

= 4:71 da n dx^ Jxi d^xi Lx) dvr / da Axi^a^ Ixi Jxi Ldxi Lx)- (7) 


Eq. 4 follows from Eq. 3 by a continuous Fourier transform of l/|r — r'|. Eq. 5 follows from 
Eq. 4 by a continuous Laplace transform of 1/p^. Eq. 6 follows from Eq. 5 since all functions 
are separable. In the above equation, the nine-dimensional integral is reduced to a triple 
product. Each one of these products are composed of three dimensional integrals that is 
dehned according to: 


Axi^a, Ixi Jxi ddxi Lx) / dx I dx 




fi.{x)fjAx')fKAx)fLAx’). ( 8 ) 


For either one-dimensional planewaves or Gaussians. the above three-dimensional integral 



FIG. 1. Ratio of the exact exchange energy to the Kohn-Sham exchange energy^ as a function 
of the number of electrons placed inside a cubic box. The total number of electrons of each spin 
varies from M=7 (right-most point) to M=9171 (left-most point). For purposes of presentation the 
variable designating the number of electrons (M) is taken to be 
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FIG. 2. Time per electron, in milleseconds, on a MacBook Air, to evaluate the exchange energy 
as a function of the number of electrons. The 0(N^/^) storage approach, described in this paper, 
shows that the time required to calculate the exchange energy for the uniform electron gas increases 
quadratically as a function of the number of electrons. Construction of the look up table required 
approximately ten minutes of MacBook Air time. The size of the look up table for a basis of 
9171 planewaves requires less than 2.4 Mbytes of disk space. For 9171 planewaves, the memory 
required to simultaneously store all integrals, if an 0(4A^^)) storage algorithm is used, would be 
approximately 2.8 Gbytes. 


can be determined, as a function of a, without significant difficulty. It is possible that for 
other separable functions these integrals would be difficult to calculate. However, since in 
the worst case there are only of these integrals, one can imagine calculating them only 
once and storing them forever. This means that one only needs to find an efficient numerical 
method for performing the Laplace integral in Eq. 6. From this standpoint, an observation 
that is absolutely key to capitalizing on this quasi-separable form is that by integrating 
the above expression (Eq. 8) over the a-dependent part of the, now, two dimensional 
integral, can in principal, be reduced to products of quantities with the following form: 


exp{- 


4a 


a 


— n 

— ^n=0^ri 


{x — x') 


l\2n 


( 9 ) 


with ttn = (—l)"'/n!. Therefore, for a large enough value of Oc, it follows that Eq. (7) may 
be rewritten, to any desired precision, according to: 


Cl 


IJKL 


== dvr 


da A^{a, 4, J^, JC, L^) + 47rS^or„(I, J, K, L) 


da 


2 


( 10 ) 


In the above equation the T^ are hard-to-determine constants that depend upon the func¬ 
tional form of separable basis sets, the Taylor expansion coefficients, a„, in Eq. (9), a lot 
of really complicated algebra, triple products of two-dimensional integrals associated with 
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Eq. (8), and the collection of common coefficients of arising from the occurrence of 

triple summations associated with each cartesian coordinate. It would be algebraically dif- 
hcult and computationally inefficient but not impossible to calculate these numbers. How¬ 
ever, for the purpose here it is only necessary to know that the valne of r„ could, 
in principle, be found and to accept that knowledge about the asymptotic power 
law associated with the Laplace integrand provides very important information 
about how to numerically evaluate the integral which extends to infinity. To 
make further progress, the second term in the Eq. 10 is temporarily rewritten by making 
the substitution t = and dt = -h-^. This leads to: 

C'ijKL = 47r [ ^ daY\A^{a,I^,J,,K,,L,) + 27rE^^^r4l,J,K,L) f'^^t^^dt. (11) 

Jo 2 , Jo 

Now, since both dehnite integrals are to be evaluated over a hnite interval, these integrals 
can be evaluated using Gaussian-quadrature or other one-dimensional numerical integration 
meshes according to: 


C*iJKL Ixi Jxi Kxi Li 


3/2 

+4^i:« „r„(i, J, K, 

a/ 


( 12 ) 


In the above expressions the two sets of Gaussian-quadrature weights and points, wu^ a* 
and tC 2 i, ti depend only on the choice of cKc and methods and codes for choosing these points 
are widely available and well known.— A back transformation of the right-hand sum, 
obtained by setting ^ and dehning Gj = ■, the integral collapses to the 

original recognizable form: 


C'lJKL [O, crj Axipii^ /j,, Kxi J-'x') 


(13) 


+4,rEe,aE“„r„(I,J,K,L) 


n+i 


a, 


With a suitable redehnition of notation for the volume elements and the recognition that the 
second term includes a summation which is exactly equal to n. Axi^Oiiy Ix) Jx) Kxi Lx')y the 
Laplace integral is reduced to quadratures over products of three one-dimensional integrals 
(Eq. 8). Here, it is emphasized, that Eq. (7) could have been immediately written in terms 
of numerical integrals. However the analysis followed allows one to determine how the 
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asymptotic form of the integrand scales so that the particular case of Gaussian quadrature 
methods, that are amenable to numerical evaluation of polynomials over hnite intervals, 
may be used for performing the integrations. As written, it has been demonstrated that one 
needs to store at most one dimensional integrals to reconstruct any of the integrals. 
Based on past usage of quadrature methods, it is reasonable to expect that one can perform 
multiscale numerical one-dimensional integration, such as the Laplace transformation here, 
with approximately 30-100 sampling points.— 

Gijkl= J^, L^) (14) 

X 

While the results discussed here are a factor of 2-4 away from this goal, it is likely that 
the number of sampling points can be signihcantly decreased by determining the value of 
«c which allows for the most efficient numerical integration, by breaking the a integral 
(Laplace transformation) into more than two intervals, and/or by using techniques similar 
to the variational one-dimensional exponential quadrature methods of Ref.— For example, 
a quadrature mesh constructed to integrate polynomials of rather than x, would be 
twice as efficient as the standard Gaussian quadratures meshes. Except for the clear need 
to exploit the t = \l\foi transformation for the hnal interval that extends to cxd, Ending 
the best quadrature sums are expected to depend on the form of the separable functions 
being employed. Here, for simplicity and reproducibility by others, only standard Gaussian- 
quadrature methods, with Oc = 1, are used. 

III. REDUCTION OF STORAGE TO FOR PLANE WAVES: EXACT 

EXCHANGE FOR THE UNIFORM ELECTRON GAS 

For planewaves, the product of the one-dimensional functions reduce to a product 

of two one-dimensional planes waves which is itself a planewave. If one starts with 
one dimensional planewaves (e.g. fi = exp{i2lTT/L), the products will only provide 2A^/^ 
plane waves. Therefore the number of one-dimensional integrals that are required is reduced 
to 4A^/^. As a simple application, the M-dependence of the exact exchange energy of an 
unpolarized gas of 2M electrons in a box with hnite volume (V=LxLxL) is determined in this 
section. As M gets very large, the exchange energy will converge to the Kohn-Sham value 
of Eks = —(3/4)(6/7r)^/^MA3/2,. it is also easy to verify based on scaling arguments that 
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for any number of planewaves placed inside such a box, the exact exchange energy will scale 
a /5(M, {gk})/L with f3 depending on the occupations {q^} as a function of wavevector and 
the number of electrons M placed in the box. Here to validate the numerics, the standard 
choice of occupation numbers are taken to be unity for all planewaves enclosed in a Fermi 
sphere of various radii. The radii, or Fermi wavevector, are chosen so that there are shell 
closings in reciprocal space. 

For a finite system, it is possible to fully occupy a Fermi sphere for a well defined cutoff 
wavevector if one chooses M= 7, 33, 123, 257, 515, 925,1419, 2109, 3071, 4169, 5575, 7153, 
or 9171 electrons of each spin. In Fig. 1, the ratio of the exact exchange energy to the Kohn- 
Sham energy is presented as a function of 1/M^/^. In the large M limit, it is evident that 
this ratio converges linearly to 1. This indicates that all the integrals are being performed 
accurately. In Fig. 2, the time required per electron, as a function of the total number of 
electrons, is shown. For cases where each KS-orbital is identically equal to a planewave the 
time required for the calculation of the exchange (or coulomb) interaction scales as the square 
of the number of electrons. For 9171 electrons, the Hartree-Fock exchange energy can be 
calculated in four seconds on a MacBook Air. In Table I, the convergence of the Hartree-Fock 
energy for M=9171 parallel spin electrons is shown as a function of Gaussian-quadrature 
mesh. For purposes of reproducibility, the first mesh is determined by Q quadrature points 
on the interval between 0 and 1. These points, designated by {ai^wu} in Eq. (12) are then 
transformed as described above to reduce the calculation of each exchange integral to the 
form shown in Eq. 14 (e.g. a total of 2Q mesh points for the two intervals). The results 
show that with standard quadrature methods, and an overly simple tesselation into only 
two sub-intervals, it is difficult to efficiently converge the energy due to sharp structure 
near a = 0. However, as shown in the right-most columns, if one further breaks the first 
interval into sub-intervals dehned by [0,1/5^], [1/5’^, 1/5®],..., [1/5,1] and then uses 5-, 10-, 
and 15-point quadrature meshes in each of these sub intervals, convergence of the energy for 
M=9171 electrons is achieved. 

To summarize, this paper provides a practical and systematically improvable algorithm 
that reduces the storage required for the coulomb integrals to for the special cases 

of basis sets that are commonly used in electronic structure calculations. For the case 
of planewave calculations, it is only recently that researchers have begun to entertain the 
possibility of performing multiconfigurational corrections using such basis sets. The results 


Mesh 1 (Q) 

Interval 1 

Total 

Mesh 2 (Q) 

Interval 1 

Total 

90 

0.936629 

0.965018 

8x5 

0.936961 

0.965350 

105 

0.936884 

0.965272 

8x10 

0.937043 

0.965431 

120 

0.936953 

0.965341 

8x15 

0.937043 

0.965431 

150 

0.937001 

0.965389 




180 

9.937019 

0.965408 





TABLE I. Ratio of exchange energy to Kohn-Sham exchange energy for a cube containing 
2M=18342 electrons as a function of the number of quadrature points used in Eq. 14. Mesh 1 
uses Q quadrature points on an interval between 0 and 1. Mesh 2, which breaks interval 1 into 
eight sub-intervals with geometrically varying length scales is numerically more efficient and allows 
for at least six-place precision. This suggests that the variational exponential quadrature methods, 
used for radial integrations in Ref.— may be more efficient 


of this paper significantly lower the storage reqnirements needed for either DFT, Hartree- 
Fock, or mnlti-confignrational methods based npon planewaves. Fntnre improvements of 
this method, with initial applications of the self-interaction corrections^— to the nniform 
electron gas calculations are in progress.— As compared to structurally simpler plane-wave 
methods, conversion of this algorithm for use withing Gaussian-based-orbital methodologies, 
will require a large investment of programming time but are fully expected to provide the 
same reduction of memory/disk requirements for reconstruction of the two-electron integrals. 
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